Video signal processing for bandwidth reduction

ABSTRACT

At the transmitter a coder includes a motion vector generator providing vectors (MV) describing the movement of individual blocks of pixels. The video signal is 4:1 compressed in bandwidth by pre-filters and a sub-sampling unit to produce a signal (SSH) from which a high-definition image can be re-constructed in the coder and in the decoder. The sampling lattice is shifted in accordance with the motion vectors (MV) which are digitally transmitted along with the compressed bandwidth analogue signal, to enable the samples to be correctly located in the reconstructed image. Poorly correlated moving areas are handled by pure spatial filtering (pre-filter and sub-sampling unit) with reconstruction by spatial interpolation. The two reconstructed signals (RVH and RVL) at the coder are compared in a mode selector with the input video and a switch is set to transmit whichever of the compressed signals (SSH, SSL) gives the better match. A mode signal is also transmitted and, at the receiver decoder, this operates a switch to select (RVH') or (RVL') correspondingly for feeding to the display. In a modified embodiment the sub-sampling unit does not employ a moving lattice but all samples for a four field sequence are taken from one field of each sequence.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to methods of coding and decoding video signals, to enable the bandwidth of the transmitted (or recorded) signal to be reduced. Although the invention is described in detail with reference to the European 625 line, 50 field/s interlaced standards, the invention is not restricted to any particular standards and may be used with both interlaced and non-interlaced (sequential) systems. Whenever the terms transmitter and receiver are employed (or analogous terms), the terms recorder and playback machine may be understood as unwritten alternatives. The invention is of particular utility in transmitting HDTV (high definition television).

2. Description of the Related Art

1. Chiariglione, L., Corgnier, L. and Guglielmo, M. Pre- and Post-Processing in a Video Terminal using Motion Vectors. 1986. I.B.C. Brighton 1986.

2. Girod, B., Thoma, R. Motion-compensating conversion without loss of vertical resolution line-interlaced television systems. Eurasip Workshop on `Coding of HDTV Signals`, L'Aquila, November 1986.

3. Storey, R. HDTV Motion Adaptive Bandwidth Reduction using DATV. BBC Research Department Report 1986/5. British Patent Application No. 8531777.

4. Storey, R. Compatible Transmission of HDTV in a 625 line Channel. British Patent Specification No. 86 20110.

5. Thomas, G. A. Bandwidth Reduction by Adaptive Subsampling and Motion Compensation DATV Techniques. October 1986. 128th SMPTE Technical Conference, Oct. 24-29 1986, New York. British Patent Applications 8606809 and 86 17320.

6. Ninomiya, Y. et al. 1984. A Single Channel HDTV Broadcast System, The MUSE. NHK Laboratory Note No. 304.

SUMMARY OF THE INVENTION

It is possible to effect bandwidth reduction of a still picture by sub-sampling, transmitting different sets of sub-sampling points in a cyclic sequence of fields, e.g. a sequence of four fields, and building up the picture at the receiver by accumulating the points from each cycle of fields. The bandwidth reduction is achieved essentially by temporal filtering. Further reduction may be effected by transmitting only some points in this way and creating the others at the receiver by interpolation.

The temporal filtering procedure cannot be applied to a moving picture area as it would blur the image. It has already been proposed to switch to a low spatial detail filter to provide the compressed bandwidth signal for moving areas (Reference 3). However, it is desirable to be able to transmit with high detail a moving area which maintains good correlation from field to field. The loss of resolution when a low spatial detail filter is used is very obvious.

The object of the present invention is to make it possible to effect bandwidth reduction in a manner such that correlated detail of moving areas is not lost.

BRIEF DESCRIPTION OF THE DRAWING

The invention is defined with particularity in the appended claims and will now be described in detail, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 represents a basic sample structure used in a bandwidth reduction system;

FIG. 2 represents reconstructing a detailed image in stationary areas;

FIG. 3a represents an effective sampling lattice used in moving areas to act as a low spatial detail filter;

FIG. 3b represents the required prefilter characteristics;

FIG. 4 represents the interpolation of velocity vectors when vectors are measured across a picture period;

FIG. 5 represents the effect of block size on the number of samples in a block;

FIG. 6 represents a simple two-dimensional linear interpolator;

FIG. 7 represents reconstructing a detailed image using motion vector information;

FIG. 8 represents the reconstruction process shown for a one-dimensional case;

FIG. 9 represents the problem of obscured background and how `following back` helps;

FIG. 10 is a block diagram of a complete transmission system embodying the present invention;

FIG. 11 is a block diagram of a second embodiment of the invention;

FIGS. 12 and 13 show prefilter characteristics for the system of FIG. 11;

FIGS. 14a and 14b show sampling lattices for the embodiment of FIG. 11;

FIG. 15 illustrates the action of a simple interlace-to-sequential converter; and

FIGS. 16 to 19 illustrate the spectra of sampled objects moving at various velocities.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In order to investigate motion adaptive bandwidth reduction systems, some hardware has recently been built (Ref. 1). This equipment contains two digital filters which filter an incoming monochrome 625 line TV signal in two different ways; one optimal for moving areas of the picture (a spatial filter) and one optimal for stationary ones (a temporal filter). Both types of filter reduce the bandwidth by a factor of 4 and hence allow the filtered signal to be resampled on a coarser sampling grid (FIG. 1) which has a 4 field repeat sequence. In FIG. 1, a dot represents a site which is not sampled. The numerals 1 to 4 represent the sites which are sampled in fields 1 to 4 respectively of a four-field repeat sequence. This method allows a highly detailed picture to be transmitted in stationary areas, as samples taken over a 4 field period can be accumulated in the receiver to build up a detailed image. This is illustrated in FIG. 2 in which the upper part is an "end-on" view of samples X coming into the receiver in successive fields n-3, n-2, n-1 and n, while the lower part shows the samples accumulated in the output picture. The dots are the unsampled sites (see FIG. 1) and these intervening points are interpolated in the receiver in a manner well known per se.
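The accumulation of FIG. 2 can be sketched in a few lines of Python. The lattice of FIG. 1 is not reproduced here, so the phase assignment in `phase_mask` is a hypothetical one, chosen only so that the four phases together fill a quincunxial half of the sites; the function names are likewise illustrative and are not taken from the hardware of the references.

```python
import numpy as np

def phase_mask(height, width, phase):
    """Boolean mask of the sites sampled in one phase (0..3).

    Assumed lattice (not the one of FIG. 1): quincunxial sites are those
    where x + y is even, and they are split evenly between four phases.
    """
    y, x = np.mgrid[0:height, 0:width]
    quincunx = (x + y) % 2 == 0
    site_phase = (y % 2) * 2 + (x // 2) % 2
    return quincunx & (site_phase == phase)

def accumulate_still(picture):
    """Accumulate four fields of subsamples of a stationary picture."""
    store = np.zeros(picture.shape, dtype=float)
    filled = np.zeros(picture.shape, dtype=bool)
    for phase in range(4):
        mask = phase_mask(picture.shape[0], picture.shape[1], phase)
        store[mask] = picture[mask]   # samples arriving in this field
        filled |= mask                # sites now held in the framestore
    return store, filled
```

After four fields, half of the sites hold transmitted samples; the remaining sites would then be filled by a spatial interpolator, which is omitted from this sketch.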

In moving areas this technique cannot be used (since allowing samples to accumulate over a period of time would blur any moving object), so the output from the low spatial detail filter gives a better representation of the image. The signal from this filter can be fully represented by the samples taken in one field (FIG. 3a). FIG. 3b shows the characteristics required of a prefilter. The broken lines show the Nyquist limits in vertical and horizontal frequencies for a full sequential system. The hatched triangle denotes the prefilter passband. The hardware selects which of the two transmission methods is more appropriate on a block-by-block basis, a block being shaped like a diamond about six pixels across. Information on which transmission mode has been selected for each block is sent to the receiver as a digital signal, so that it knows how to perform the reconstruction. We refer to this generally as DATV, meaning digitally-assisted television.

One major advantage of this bandwidth reduction method is that the bandwidth reduced signal can be viewed on a `simple` receiver (without a framestore and having only a simple spatial interpolator) to give a recognisable picture. The only significant spuriae would be aliasing in areas sent in the `high detail` mode, since they have not been prefiltered in a way that enables the signal to be represented by only one field of samples. This produces an effect rather like interline twitter horizontally and vertically, at a quarter of the field rate. Further, if a picture containing, say, 1249 lines were coded in this way, it would be possible to transmit the samples in such a way as to make the signal `look like` a 625 line signal.

This bandwidth reduction system has performed surprisingly well, but suffered from one major disadvantage, namely that the loss of resolution in areas of well correlated movement (that the eye can easily follow) was very obvious. In order to maintain a high spatial resolution in areas of well correlated motion, it is necessary, in accordance with the present invention, to be able to measure the speed and direction of such motion, and move the subsampling lattice in such a way as to make the moving object appear stationary as far as the sampling structure is concerned.

There are thus two problems to be solved: first, a good technique of measuring motion vectors must be found, and secondly the details of how to use this information to improve the transmission system must be worked out.

The problem of finding a good motion measurement technique has been the subject of a study involving computer simulation of promising techniques. The results of these investigations are described elsewhere (Ref. 5).

This invention is concerned with the second problem, namely the application of motion vector information to bandwidth reduction systems using moving sampling structures. Algorithms investigated and results obtained are described.

The technique of motion vector measurement described in Ref. 5 involves performing a phase correlation between fairly large blocks in two successive pictures, and extracting the dominant vectors present by hunting for peaks in the correlation surface. Each pixel (or small block) in the picture is then assigned one of these vectors, on the basis of which vector gives the best match when the pixel (or block) in one picture is compared to the corresponding area in the other picture but displaced by the motion vector. Given below are the details of how this technique has so far been applied to the bandwidth reduction problem discussed here.

Each odd field of the input picture is divided into measurement blocks of 64 pixels by 32 lines (other similar sizes would probably work reasonably well, but it is convenient if they are powers of 2 since Fast Fourier Transforms are used to perform the correlation). A phase correlation (see Ref. 5) is performed between corresponding blocks in successive odd fields, and the correlation surface (without any prior interpolation) is searched to find the three largest peaks. The region of search can be limited to an area of the correlation surface corresponding to a `sensible` range of velocities; however, if the measurement blocks are only 64 pixels wide and the measurement is performed across a picture period, then the full range of +/-32 pixels per picture period is probably required (equivalent to a maximum velocity of about one second per picture width).
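As an illustration only, the phase correlation step can be sketched with NumPy as follows. This is a minimal sketch and not the implementation of Ref. 5: the function name `phase_correlate` is hypothetical, no windowing or sub-pixel interpolation is applied, and the `peaks` are simply the largest values of the surface rather than true local maxima, so neighbouring points of one broad peak could be returned twice.

```python
import numpy as np

def phase_correlate(block_a, block_b, num_peaks=3):
    """Return up to num_peaks candidate shifts (dx, dy, height) of
    block_b relative to block_a, strongest first."""
    fa = np.fft.fft2(block_a)
    fb = np.fft.fft2(block_b)
    cross = fb * np.conj(fa)
    # Keep only the phase of the cross-spectrum (unit magnitude).
    cross /= np.maximum(np.abs(cross), 1e-12)
    surface = np.real(np.fft.ifft2(cross))

    h, w = surface.shape
    order = np.argsort(surface, axis=None)[::-1][:num_peaks]
    peaks = []
    for idx in order:
        dy, dx = divmod(int(idx), w)
        # Map the cyclic indices to signed shifts, e.g. -32..+31 for w = 64.
        if dy >= h // 2:
            dy -= h
        if dx >= w // 2:
            dx -= w
        peaks.append((dx, dy, float(surface.flat[idx])))
    return peaks
```

For a 32 line by 64 pixel block, a purely cyclic shift of the block content gives a single peak of height close to 1 at the corresponding displacement; real picture material gives lower, broader peaks.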

The vector measurement and assignment is performed across a picture period rather than a field period for several reasons:

(a) this enables the correlation and matching processes to have access to samples taken from the same points spatially, which means that stationary objects with a high amount of vertical detail do not appear to be moving;

(b) all movements are twice the size that they would be if measurements were made across a field period, which actually enables them to be measured more accurately (on an absolute basis) and enables similar velocity vectors to be distinguished more easily;

(c) since the vectors for the intermediate fields are interpolated from the `picture based` vectors, there is only half as much vector information to transmit to the receiver;

(d) the amount of vector measurement processing at the transmitter is halved.

The only penalty for this approach is that objects which accelerate significantly in a picture period will not be correctly tracked. However, this is a relatively rare occurrence. FIG. 4 shows how the velocity vector interpolation is performed. Given that the sum of the motion over two fields is equal to the movement over one picture, we have

    $$M_n = V_{n-1,2} + V_{n,1}.$$

Moreover the vectors for field 2 are the average of the field 1 vectors on either side, so that

    $$V_{n,2} = \tfrac{1}{2}\,(V_{n,1} + V_{n+1,1}).$$

It should be noted that this interpolation technique cannot be used when the input signal comes, say, from a telecine, where both fields 1 and 2 refer to the same instant of time. In this situation, the motion vectors for field 2 would always be zero.

Each little diamond shaped transmission block is assigned one of the vectors measured in the correlation process. The list of `menu vectors` from which a vector is selected is formed from the (maximum of) three vectors measured in the measurement block containing the transmission block (the `central` block), along with the vectors measured in the immediately adjacent measuring blocks. Thus a maximum of 27 vectors could be tried. However, there is little point in trying several very similar vectors (the velocity of a large object will probably have been measured in several adjacent blocks) so vectors in neighbouring blocks are only included if they differ from vectors in the central block by more than a specified amount. A minimum difference value of 0.2 pixels per picture period was found to be a reasonable figure. Also, it was found that the accuracy of vector assignment in some areas of the picture could be improved by measuring the sum of the modulus pixel difference across an area slightly larger than that of the decision block (areas roughly one pixel bigger all round were tried).

As discussed in Ref. 5, the accuracy of the vector assignment process can be improved if the modulus error summed over a transmission block is multiplied by a weighting factor that increases with the magnitude of the vector. This means that where two vectors fit an area almost equally well, there will be a bias towards selecting the smaller vector. Thus areas devoid of significant detail tend to get assigned the same small vector all over rather than a mixture of vectors of various sizes. Future investigations will look at the possibility of other improvements to the vector assignment process, such as reassigning a vector if a block has a different vector from all its neighbours.
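The menu construction and the weighted assignment described above might be sketched as follows. The 0.2 pixel threshold is the figure quoted in the text; everything else is an assumption made for illustration: the function names, the use of Euclidean distance between vectors, and in particular the linear weighting `1 + weight_gain * |v|`, since the text says only that the weighting factor increases with the magnitude of the vector.

```python
import numpy as np

def build_menu(central, neighbours, min_diff=0.2):
    """Central-block vectors plus those neighbouring-block vectors that
    differ from every central vector by more than min_diff."""
    menu = list(central)
    for v in neighbours:
        if all(np.hypot(v[0] - c[0], v[1] - c[1]) > min_diff for c in central):
            menu.append(v)
    return menu

def assign_vector(menu, errors, weight_gain=0.05):
    """Pick the menu vector with the smallest weighted error.

    errors[i] is the modulus difference summed over the block for menu[i].
    The weighting law is an assumed example, not the one of Ref. 5.
    """
    best_vector, best_score = None, float("inf")
    for v, err in zip(menu, errors):
        score = err * (1.0 + weight_gain * np.hypot(v[0], v[1]))
        if score < best_score:
            best_vector, best_score = v, score
    return best_vector
```

With this weighting, a large vector must fit noticeably better than a small one before it is chosen, which gives the bias towards small vectors in areas of little detail.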

The overall accuracy that is required for the motion vector measurement process can be estimated by considering the way in which the picture is sampled and reconstructed. Four successive fields of information are combined to make one output picture, so if there is a measurement error of one pixel over that period of time then there will be a significant drop in the amplitude of a frequency having one cycle in two pixel periods (the Nyquist limit). Hence motion vectors should be measured to an accuracy better than about a quarter of a pixel per field period, or half a pixel per picture period. The phase correlation technique discussed above is capable of accuracies of about 0.2 pixels per picture period without anything cleverer than quadratic interpolation between points on the correlation surface. Hence this technique should be easily adequate for this application.

Although the motion vector measurement technique of Ref. 5 has been found to be very effective, there is no reason why other techniques could not be used in the bandwidth reduction system of this invention, as long as their performance reaches the figures discussed above.

As mentioned earlier, the input picture should be subjected to a mild diagonal spatial filter and a vertical-temporal filter prior to sampling. In practice there are so few frequencies in real pictures with high diagonal components that this filter can be dispensed with for test purposes without having any significant effect on the results. This was done with the hardware described above, and has also been done in the computer simulations described here.

The vertical-temporal filter needs to be `motion compensated` when used with moving sample structures since picture material moving with the assigned velocity vector must be considered to be stationary. This means that the locations of input samples to the filter must be displaced by the sum of the motion between the field containing the input samples and the `central` field. Sub-pixel interpolation is therefore required, since the motion vectors are known to sub-pixel accuracy. In practice, however, this interpolation can be performed by changing the values of the filter coefficients slightly. Each transmission block must be filtered separately since it has its own motion vector and its own motion history.

In order to transmit a moving object in the `high detail` mode, it is necessary for the sampling structure to move with the object. The reason for this can be readily understood in terms of the current European 625 line interlaced TV system. Interlace is optimal for objects which do not move vertically; in such a situation it is possible to reconstruct a full 625 line sequential picture of the object (although motion compensation techniques would still be needed if there is any horizontal movement). As soon as the object starts to move vertically, it is no longer being properly sampled. At a speed of movement of one picture line per field period, the same sites on the object are being sampled every field, and the scan has become effectively 312 lines sequential. If, however, the scanning raster were locked to the object, it would always be possible to reconstruct a highly detailed sequential image. The situation with the bandwidth reduction system under discussion here is very similar, except that the sampling structure is effectively interlaced in the horizontal direction as well.

Each little diamond shaped transmission block must be considered to have its own sample structure which moves according to the motion vectors for that block. This immediately places a restriction on the size and shape of the transmission blocks, since in order to maintain a constant data rate we must always transmit a fixed number of samples per block. As FIG. 5 shows, a diamond shaped block A 6 pixels wide could contain 1, 2 or 4 samples depending on the relative position of the block and sample structure. The only reasonable block sizes that satisfy the `fixed number of samples` criterion are squares or diamonds 4 or 8 pixels wide. Diamonds B 8 pixels wide were chosen for this investigation, as this size roughly matches the apertures of the spatial filters. The advantage of using diamond rather than square blocks is that since many, possibly most, objects have horizontal or vertical edges, there are fewer problems with edges of objects moving out of many blocks simultaneously and making the block structure more obvious. There would probably be an advantage (from the point of view of picture quality) if smaller blocks were used, but this would increase the bit rate of the digital part of the transmission link unacceptably. Indeed, it may become necessary to use larger blocks to limit the bit rate further.

The `first` set of samples to be taken in a transmission block are sited as shown for field 1 in FIG. 1. The next set of samples are taken at the positions shown for the sample sites in field 2, but shifted by the motion vector assigned to the block. Where samples are required at points between input sample sites, a simple linear interpolator was used (FIG. 6). This is bound to limit the spatial frequency response somewhat (especially vertically if the source is interlaced) but the use of a more sophisticated interpolator would help. Subjectively, the improvement made by upgrading to an interpolator based on cubic spline fitting was marginal, but the processing time required increased significantly.
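A two-dimensional linear interpolator of the kind referred to can be sketched as below. FIG. 6 is not reproduced here, so this is the standard bilinear form rather than necessarily the exact arrangement of the figure; the function name and the clamping of positions at the field boundary are assumptions made for illustration.

```python
import numpy as np

def sample_bilinear(field, x, y):
    """Value of `field` (indexed [line, pixel]) at the fractional
    position (x, y), by linear interpolation between the four
    surrounding input samples. Positions outside the field are clamped.
    The field must be at least 2 x 2 samples."""
    h, w = field.shape
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(np.floor(x)), w - 2)
    y0 = min(int(np.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * field[y0, x0] + fx * field[y0, x0 + 1]
    bottom = (1.0 - fx) * field[y0 + 1, x0] + fx * field[y0 + 1, x0 + 1]
    return (1.0 - fy) * top + fy * bottom
```

A sample site shifted by a motion vector `(vx, vy)` from the lattice position `(sx, sy)` would then be read as `sample_bilinear(field, sx + vx, sy + vy)`.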

Initial investigations allowed this process to continue `ad infinitum`, so that the sample structure could wander indefinitely.

This approach has several drawbacks:

(a) A moving object could come to rest such that the sample structure was displaced by one picture line from its `natural resting place`. This would mean that samples were always being interpolated between two field lines, resulting in poor vertical resolution. This problem would be much less acute with a sequential source.

(b) The sample structures of two adjacent blocks could move in opposite directions, resulting in a permanent gap being left in the sample structure as a whole.

(c) Any transmission error in the digital motion vector information would cause the receiver to lose track of the correct sample structure location, since the location of every sample depends on the complete motion history of the block.

In order to avoid these problems, a method was devised where the position of the sample structure is reset to its `field 1` position at the start of every 4 field period. Thus the position of the structure is never more than 3 motion vectors away from where it started. However, this means that each set of 4 fields must be dealt with in isolation, and the reconstruction process must look `forwards` in time, as well as `backwards`. This is discussed in more detail below.
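The reset rule can be expressed as a short routine that tracks the displacement of a block's sample structure from its `field 1` position. This is a sketch under stated assumptions: `lattice_offsets` is a hypothetical name, and `field_vectors[i]` is taken to be the motion vector of the block from field i-1 to field i, with the list starting on a field 1.

```python
def lattice_offsets(field_vectors):
    """Offset of the sample structure for each field of one block.

    The offset is reset to zero on the first field of every group of
    four, so the vector leading into that field is ignored, and the
    offset is never the sum of more than three vectors.
    """
    offsets = []
    ox = oy = 0.0
    for i, (vx, vy) in enumerate(field_vectors):
        if i % 4 == 0:
            ox = oy = 0.0          # back to the fixed field 1 sites
        else:
            ox += vx
            oy += vy
        offsets.append((ox, oy))
    return offsets
```

Because each group of four fields starts from a known position, an error in one transmitted vector can disturb the sample positions only until the next reset.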

When the sample structure moves by more than 4 pixels horizontally or vertically, it is of course necessary to let `new` sample sites move into the block as others `fall off` the edge. As discussed above, the block size was chosen so that new sample sites always come in as others fall out, maintaining a constant number of samples per block.

We now consider the process required to reconstruct a complete sequential picture from four fields' worth of subsamples.

Each transmission block is reconstructed separately. The reconstruction process can be thought of as placing the samples from each of the 4 subsample phases in appropriate places in a `framestore` in such a way that at the end of the process, half the locations in the framestore are filled with samples, forming a quincunxial pattern. The other half of the locations are then interpolated using a two dimensional spatial interpolator, as in the stationary picture case of FIG. 2.

Each output picture is formed from the 4 fields of samples containing the current `phase` (see FIG. 7). The numerals 1, 2, 3, 4, 1 . . . represent the fields of the subsample sequence. The first fields 1 are always sampled at fixed sites. The field to field motion vectors V₁₂, V₂₃ and V₃₄ are symbolized by arrows. The vectors V₄₁ are not used. Then, by way of example, the shifts which are applied to reconstruct field 3 analogously to FIG. 2 are shown. For example, field 2 is shifted by -V₂₃. One set of samples will thus not need to be moved by any motion vector since they refer to the same time as the current output picture. The other three sets need to be moved by an amount equal to the sum of the motion vectors between the time when the samples were taken and the time corresponding to the output picture. Since the sites that were sampled were offset by the same sum of motion vectors, the final arrangement of samples after the reconstruction process is guaranteed to be quincunxial, like the lattice shown in FIG. 1. The reconstruction process is illustrated for a one dimensional case in FIG. 8 with the sample sites of fields 1-4 illustrated at the top in the absence of motion. A moving object is represented by a bar R in fields 1-4 with the shifted sampling points marked 1, 2, 3, 4 on each bar. Then the vector shifts of FIG. 7 are shown applied to form the reconstructed image RR for field 3, containing all four sampling points.
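The bookkeeping of these shifts can be sketched as follows. The text gives one explicit case (field 2 is shifted by -V₂₃ when field 3 is reconstructed); this sketch generalises it by assuming that samples from earlier fields are shifted by minus the sum of the intervening vectors and samples from later fields by plus that sum. That sign convention for later fields, and the name `reconstruction_shifts`, are assumptions and should be checked against FIG. 7.

```python
def reconstruction_shifts(v12, v23, v34, output_field):
    """Shift (dx, dy) applied to the samples of each field 1..4 when
    reconstructing the picture for `output_field` (1..4)."""
    steps = [v12, v23, v34]            # steps[i] links field i+1 to field i+2
    shifts = {}
    for field in range(1, 5):
        lo, hi = sorted((field, output_field))
        between = steps[lo - 1:hi - 1]  # vectors spanning the two times
        total_x = sum(s[0] for s in between)
        total_y = sum(s[1] for s in between)
        sign = -1 if field < output_field else 1
        shifts[field] = (sign * total_x, sign * total_y)
    return shifts
```

For output field 3 this gives no shift for field 3 itself, -V₂₃ for field 2, -(V₁₂ + V₂₃) for field 1 and, under the stated assumption, +V₃₄ for field 4.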

The exact location at which the samples end up is unlikely to be coincident with the set of quincunxial sites available, since the location of the sites depends on the position of the subsample lattice for the corresponding input field. Hence each block in the final picture constructed in the framestore at the receiver could be out of position by up to half the spacing between quincunxial sites. This means that sub-pixel interpolation is needed when reading out of the framestore, in order to make up for the distance between where the samples `wanted` to go and the fixed sites available for them. The sub-pixel shift required is different for each block, since the length of the shift depends on the position of the sample structure at this point in time.

Problems can arise if each block in each phase is allowed to contribute to the output picture. We do not want an area of the picture which is visible in one phase of the sample structure but has been obscured in a later phase reappearing when the later picture is reconstructed. Thus it is necessary to work out which areas of the picture have become `obscured`, and not use samples from these areas in the reconstruction process. This problem is solved by `following back` the vectors for each area of the output picture and using the picture information located `along the way` in the reconstruction process. This means that each area of the output picture can have not more than four sets of samples contributing to it, and picture information that has become obscured is not used. FIG. 9 illustrates the potential problem and the way that this solution works. An example of the sort of picture information that calls for this special treatment is the edge of an object in the foreground moving fairly rapidly over the background. Without this `follow back` technique, the leading edge of the object could fail to be reconstructed properly by the `high detail` mode, and hence would revert to the `low detail` transmission method. It may be that the extra sophistication required in the receiver to implement this algorithm is not warranted by the marginal improvement in quality obtained.

In FIG. 9 the broken vertical lines represent block boundaries.

Blocks SO pertain to a stationary object against a moving background MV, whose movement vectors are the diagonal arrows. Blocks marked a could end up at the site a' and the block b could end up at site b' using simple reconstruction techniques in reconstructing field 4. This means that obscured background intrudes into the stationary object. If, however, each arrow in the final phase is followed backwards and only blocks found `on the way` are used, blocks a and b will not be used to generate the output picture. The same ideas apply when reconstructing using `future` as well as `past` fields.

In this example, the motion vectors have been chosen to be exactly one block width per field period for simplicity.

Problems can also arise at the junction between blocks if the motion vectors of the blocks point away from each other. In many cases, the sample structures move in such a way as to leave a few sites in the framestore unfilled. If the framestore was filled with zero samples prior to the reconstruction process, this results in little grey dots appearing in the output picture. This problem can be avoided by first filling the framestore with an image derived only from the current input sample phase, using a spatial interpolator to fill in the missing 75% of the samples. Thus if any quincunxially positioned sites in the framestore are not filled by the reconstruction process, the picture data visible in these sites will be from a sort of `low detail` version of the current field. This effectively means that the system can revert to the low detail mode on a pixel by pixel basis in an area of revealed background, without the whole of the transmission block switching to low detail mode. The `low detail` field used to fill the framestore is not a true low detail version of the input picture, since it is derived from a set of samples of a picture which has not been prefiltered with a suitable spatial filter. This means that there will be some aliasing in detailed areas, but since we only need this information to fill in a few pixels, this is not of major importance.

We now briefly describe the `fallback` low detail transmission mode, which is the same as that used in Ref. 3. If the motion measurement and moving subsampling algorithms are working well, this mode should only be needed in areas of the picture containing erratic motion or uncovered background.

The input picture is prefiltered and subsampled as shown in FIG. 3. The subsample lattice is `fixed` in the sense that it follows the 4 field sequence of FIG. 1 and does not move to follow objects as the `high detail mode` sampling lattice does. An output picture is reconstructed using a spatial interpolator, producing a field that can be compared with that obtained by the `high detail` mode as described below.

The method used in this investigation to select which transmission mode is the more appropriate for each block is the same as the method used in the hardware described earlier (Ref. 3). For completeness this method is described briefly here.

For each transmission block, the modulus difference between the original signal and the signal after passing through each of the `low` and `high` spatial detail modes is calculated. The sums of the errors across the block area are compared after multiplying by a weighting factor, and the method giving the smaller weighted error is used to transmit the block. The weighting factor used for these investigations was 0.55 for errors in the low detail mode, and 0.45 for errors in the high detail mode. This biases the mode selection slightly in favour of the high detail mode, as this was found to give the best subjective impression.
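The mode decision for one block reduces to a comparison of two weighted sums, which can be sketched as below. The weights 0.45 and 0.55 are those quoted above; the function name, the string return values and the choice of the high detail mode on an exact tie are assumptions made for illustration.

```python
import numpy as np

def select_mode(original, recon_high, recon_low, w_high=0.45, w_low=0.55):
    """Return "high" or "low" for one transmission block.

    Each argument is an array holding the pixels of the block: the
    original signal and its reconstruction through each mode.
    """
    err_high = w_high * np.abs(original - recon_high).sum()
    err_low = w_low * np.abs(original - recon_low).sum()
    # Ties go to the high detail mode (an assumption of this sketch).
    return "high" if err_high <= err_low else "low"
```

Because of the weighting, the high detail mode is still chosen when its summed error exceeds that of the low detail mode by up to about 22% (the ratio 0.55/0.45).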

An experiment was tried whereby the area of the picture over which the errors were summed was increased to be slightly larger than the area of a transmission block. Although this gave a marginal improvement in some areas of the picture, further investigations are needed to find out if this technique gives a more reliable mode selection.

Once a mode has been selected for a transmission block, the subsamples corresponding to that mode are transmitted. However, the samples in the framestore in the `detail` part of the transmission system are not changed if the low detail mode is selected. This means that the output of the high detail part of the system at the transmitter will not be exactly the same as the signal decoded at the receiver in this mode if there has been a mode change in this set of four fields. The reason for not changing the samples is that the high detail mode will then be `self consistent` and will not be corrupted just because one field did not give a good representation of the signal. This should guard against the system getting `stuck` in the low detail mode. Also, since the reconstruction technique looks both forwards and backwards in time, it is not possible to update the samples at the transmitter in a sensible way, since it would be necessary to know which fields were transmitted in which way before the reconstruction process was started.

The simulation work that has been performed to date has shown that the performance of the basic adaptive subsampling system can be significantly enhanced by the addition of motion vector information. Work on the picture sequence from which FIG. 3 of Ref. 3 is taken, for example, has shown that the car can be transmitted in the high spatial detail mode with minimal impairments, whereas without motion vector information it reverts to the low detail mode. The moving gate in this sequence can happily be transmitted in the low detail mode, however, since it is moving sufficiently fast (about 2.4 pixels per field period) that there is little spatial detail on it anyway.

There are still some minor impairments present in the output pictures. For example, the edges of the car appear slightly odd, probably because the corresponding transmission blocks contain two sorts of movement. Initial investigations did not use a vertical-temporal prefilter in the `detail` path, and it is possible that the inclusion of such a filter (motion compensated of course) may reduce the amplitude of picture information which is not being correctly followed by the motion measurement system. This may reduce the amplitude of vertical lines which tend to appear in areas not moving in a well defined way. The appearance of such spuriae could also be reduced by increasing the sizes of the various filter apertures to enable them to have sharper cutoffs.

There is scope for optimising parameters, such as the aperture sizes used in the vector assignment and mode assignment stages, the `high velocity` weighting factor, the `mode selection` weighting factor, and various details of the vector measurement stage. Thought also needs to be given to the dimensions of the measuring and transmission blocks, the number of `menu` vectors required, and the resolution required for the vectors. Many of these parameters will be dependent on the available bandwidth in the digital part of the transmission link.

We have thus described a method of applying motion vector measurement to a video bandwidth reduction system based on motion adaptive subsampling. FIG. 10 shows a block diagram of the complete transmission system. Computer simulation of the technique has produced promising results, showing a significant improvement over the performance of the original (non-vector) system. Already the technique appears to be good enough to form the basis of a high definition television bandwidth reduction system, and more work should improve its performance further.

No detailed description of the hardware is necessary since the system is built up from such well known items as filters and interpolators. The motion vector generator 10 is constructed as described in reference 5. The motion vectors MV are used to control the positions of the sampling points in the sub-sampling unit 12 which follows the pre-filters 13 and serves the high detail mode whether there is movement or not (zero vector) and provides high detail sub-sampled picture data SSH. The sampling points are moved essentially by shifting the sampling times.

High detail reconstruction 14 is implemented as described above to convert SSH back to high detail reconstructed video RVH.

A prefilter 16 and fixed lattice sub-sampling unit 18 provide low detail sub-sampled (spatially filtered) picture data SSL. Spatial interpolator 20 effects low detail reconstruction to convert SSL back to low detail reconstructed video RVL.

A mode selector 22 compares both RVH and RVL with VIDEO IN (suitably delayed) and controls a switch 24 which selects between SSH and SSL as the reduced-bandwidth analogue signal to be transmitted (recorded). The data transmitted comprises the analogue signal plus a digital signal carrying both the motion vectors MV and the mode signal indicating whether a block is being sent as SSH or SSL.
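The block-by-block decision made by the mode selector can be illustrated with a minimal sketch. The text does not state the error measure or the value of the `mode selection` weighting factor, so the sum of squared differences and the weight of 0.8 used here are assumptions, and the names `block_error`, `select_mode` and `high_detail_weight` are hypothetical.

```python
def block_error(reference, candidate):
    """Sum of squared differences between two equally sized pixel blocks.

    The error measure is an assumption; the text only says the
    reconstructed signals are compared with the input video.
    """
    return sum(
        (r - c) ** 2
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    )


def select_mode(video_in, rvh, rvl, high_detail_weight=0.8):
    """Choose which sub-sampled signal to send for one transmission block.

    A weight below 1 scales down the high detail error, so the
    comparison favours the high detail mode. The value 0.8 is
    illustrative only.
    """
    err_high = high_detail_weight * block_error(video_in, rvh)
    err_low = block_error(video_in, rvl)
    return "SSH" if err_high <= err_low else "SSL"


# One 2x2 block of the (suitably delayed) input and two reconstructions.
video_in = [[10, 20], [30, 40]]
rvh = [[10, 21], [30, 40]]   # high detail reconstruction, small error
rvl = [[12, 20], [30, 38]]   # low detail reconstruction, larger error
mode = select_mode(video_in, rvh, rvl)
```

The returned label plays the part of the mode signal that accompanies the block in the digital channel.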

At the receiver (playback machine) the decoder comprises high detail reconstruction 14' and low detail reconstruction 20' constructed as blocks 14 and 20 in the coder. The high detail unit 14' uses the received motion vectors as described above and a switch 26 is controlled by the received mode signal to select between RVH' and RVL' as the video output signal fed to the display.

The bandwidth reduction system described offers a reduction of a factor of about 4 with an interlaced input signal. The system is capable of supporting the full resolution offered by sequential sources and displays, and provides a bandwidth reduction factor of 8 with such signals. The transmitted signal can be packaged to look like a signal with fewer lines, and with minimal processing could be displayed on a simple receiver to give a reasonable picture.

In the embodiment described above, the sampling structure movement is such that the sampling structure is `reset` every 4 fields to its original starting position, and is moved during the following three fields to follow any movement. This has the effect of sampling the same points on a moving object that would be sampled if the object were stationary.
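The reset-and-follow behaviour of the sampling structure can be sketched as follows. This is an illustration only: the function names are hypothetical, a single motion vector per field is assumed for the block concerned, and the base sample points are arbitrary rather than the lattice of the figures.

```python
def cumulative_shifts(field_vectors):
    """Return the lattice shift for each field of one four-field cycle.

    field_vectors holds the (dx, dy) motion, in pixels per field period,
    between successive fields of the cycle (three entries). The first
    field always uses the datum (unshifted) lattice, which models the
    `reset` every four fields.
    """
    shifts = [(0.0, 0.0)]
    for dx, dy in field_vectors:
        prev_x, prev_y = shifts[-1]
        shifts.append((prev_x + dx, prev_y + dy))
    return shifts


def shifted_sample_points(base_points, shift):
    """Move the base sample points of a block by the accumulated shift."""
    sx, sy = shift
    return [(x + sx, y + sy) for x, y in base_points]


# A block moving steadily at 2.5 pixels per field to the right.
shifts = cumulative_shifts([(2.5, 0.0)] * 3)
third_field_points = shifted_sample_points([(0, 0), (4, 2)], shifts[2])
```

Because the shift follows the object, the sample points land on the same parts of the object that they would reach if it were stationary.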

It is possible to obtain essentially the same sample values if the samples are all taken during the first picture in the four field sequence (assuming perfect motion measurement and allowing for any differences in the prefilters), and the other three fields are not sampled at all. Information in the other fields is still used in order to generate a sequential picture out of the first interlaced field; this process uses motion vector information.

The advantage of this approach is that any errors in the motion vector measurement process will no longer manifest themselves by the appearance of the subsampling structure on the picture. Errors will instead appear as slight discontinuities at block edges; this will probably be less annoying subjectively.

Using this approach, the transmission system appears as a 12 1/2 Hz sequential system, with sample shuffling for the purposes of compatible transmission, and frame rate up-conversion at the receiver, aided by motion vector information sent in the digital assistance channel. The low detail `fallback` mode is still present, of course.

Transparent interconversion between source and display standards, using motion compensation (Reference 1), suggests a method of bandwidth reduction. Such a method may be suitable for the transmission of HDTV. It would, however, require the use of a digital assistance channel. It turns out that this system is very similar to the method suggested in References 3, 4 and 5. By viewing this method from a different perspective, the considerable differences from the NHK MUSE system (Reference 6) become more apparent. Furthermore, this viewpoint suggests possible improvements to the method described above.

The basis of this method is motion compensated temporal subsampling. This reduces the signal redundancy due to temporal frequencies caused solely by moving objects. Picture quality is preserved by transmitting this information, in the form of motion vectors, at much reduced bandwidth, as a digital assistance channel.

The following description concentrates on a `detailed` picture channel. This channel carries information from stationary and well correlated moving areas. As in references 3 to 5 and in FIG. 10 there would also be a low detail channel. This would be used when the detailed picture channel failed due to poorly correlated image content. In the absence of an HDTV standard the method is described for a conventional 625 line source. The method can, of course, be translated for use with an HDTV source.

The first part of the channel (shown in FIG. 11) is a motion compensated standards converter 30. This would produce an output at 625/12 1/2/1:1 from an input at 625/50/2:1 (video) or 625/25/1:1 (film). If, as is likely, the input were from interlaced video, this converter would have to perform an implicit interlace to sequential conversion.

This is followed by a spatial (diagonal) prefilter 32, which permits the use of 2:1 spatial subsampling in sub-sampler 34. Appropriate filtering results in little reduction of picture quality, as described in the above References 3, 4 and 5.

A practical implementation of this system would combine these first two blocks. The combined filter/standards converter would require no more hardware than the pre-filter only.

The spatio-temporal filter used is shown in FIGS. 12 and 13. FIG. 12 shows the pre-filter used in the detail channel. The filter shown is for a stationary object; it would be skewed appropriately for a moving object. The fixed pre-filter used in the low detail channel is shown in FIG. 13.

FIG. 14a shows the sampling lattice used for the high detail channel, and repeated at 12 1/2 Hz. FIG. 14b shows the four field sequence of sampling points used at 50 Hz for the low detail channel.

The temporally subsampled output from the standards converter would then be shuffled in shuffler 36 to resemble a (313/50/2:1) pseudo quincunxial signal. In an HDTV system the subsampled signal could be shuffled to resemble a compatible 625/50/2:1 signal (reference 4). The shuffling would be done by displacing objects, using motion vectors, so that they appeared in the right place at the right time. No sub-pixel interpolation is performed. Displacement vectors, used for shifting objects, are rounded to the nearest pixel vertically and horizontally.
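The rounding of displacement vectors used in the shuffler can be illustrated with a short sketch. The text specifies only that displacements are rounded to the nearest pixel; scaling the per-field vector by the number of field periods elapsed, rounding halves upwards, and the names `round_to_pixel` and `displacement` are assumptions made for illustration.

```python
import math


def round_to_pixel(value):
    """Round to the nearest whole pixel, with halves rounded upwards.

    The tie-breaking rule is an assumption; Python's built-in round()
    is avoided because it rounds halves to the nearest even integer.
    """
    return math.floor(value + 0.5)


def displacement(vector, fields_elapsed):
    """Whole-pixel shift for an object after a number of field periods.

    vector is the (horizontal, vertical) motion in pixels per field.
    No sub-pixel interpolation is implied: the result is an integer pair.
    """
    vx, vy = vector
    return (round_to_pixel(vx * fields_elapsed),
            round_to_pixel(vy * fields_elapsed))


# An object moving at about 2.4 pixels per field horizontally.
shift_after_one_field = displacement((2.4, -0.6), 1)
shift_after_three_fields = displacement((2.4, 0.0), 3)
```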

The pseudo quincunxial signal can then be transmitted, with the associated motion information transmitted in a digital assistance channel.

The received signal is unshuffled in un-shuffler 40 and interpolated in interpolator 42 to produce a 625/12 1/2/1:1 signal. This signal is then up-converted, using a motion compensated standards converter 44, to the required display standard. Again in a practical system the interpolation filter 42 and standards converter 44 would be combined.

The ultimate display standard is of little relevance to the actual transmission of the signal. This means that while the obvious display standard may be the same as the source, other display standards are equally valid. For example, the pictures could be displayed at 625/100/2:1; all the information required for a (notional) up conversion is present in the digital assistance data. The pictures could even conceivably be displayed on a 60 Hz standard. It is interesting to note that the motion portrayal of a film source would automatically be improved by the use of this bandwidth reduction system.

This system achieves a bandwidth reduction by a factor of 4. This approach to bandwidth reduction may be less susceptible to small errors in the measurement of motion vectors. Transmission via a pseudo quincunxial signal is for compatibility with transmission of a low detail signal. This latter mode of transmission would be used in areas of poorly correlated or erratic motion.
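The reduction factor can be checked with simple arithmetic on the standards quoted in the description. The sketch below assumes that the number of samples per line is the same at the source and at the converter output, so that the line rate alone measures the temporal saving; the function names are hypothetical.

```python
def line_rate(lines_per_picture, field_rate_hz, interlace_factor):
    """Active lines delivered per second for a standard such as 625/50/2:1."""
    return lines_per_picture * field_rate_hz / interlace_factor


def reduction_factor(source, spatial_subsampling=2):
    """Overall reduction for a source converted to 625/12 1/2/1:1.

    source is (lines, field rate in Hz, interlace factor). The temporal
    saving from the standards converter is multiplied by the 2:1 spatial
    subsampling that follows the diagonal prefilter.
    """
    transmitted = line_rate(625, 12.5, 1)
    return line_rate(*source) / transmitted * spatial_subsampling


interlaced_video = reduction_factor((625, 50, 2))   # 625/50/2:1
film = reduction_factor((625, 25, 1))               # 625/25/1:1
sequential_video = reduction_factor((625, 50, 1))   # 625/50/1:1
```

Under these assumptions the interlaced and film inputs both give a factor of 4, and a 50 Hz sequential source gives 8, in line with the figures quoted for the system.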

The hardware requirements for a simple version of this system are comparable with those for the system of reference 5 and FIG. 10 above. Indeed much of the hardware is identical.

A simple version of the transmitter standards converter could be built initially. It would simply interpolate the `missing` lines in field 1 (of a 4 field sequence) from field 2, using motion vectors. Fields 3 and 4 would be discarded. These fields would, however, be used for motion measurement. At a later date a more sophisticated converter, with more hardware and using fields 3 and 4, could be built. This should improve system performance.
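The simple converter can be illustrated by a minimal sketch that fills the `missing` lines of field 1 from field 2. It makes several simplifying assumptions that are not in the text: a single whole-pixel vector applies to the entire picture, the vertical component is an even number of picture lines (so the source line always exists in field 2), and picture edges are handled by clamping. The name `fill_missing_lines` is hypothetical.

```python
def fill_missing_lines(field1, field2, vector, height, width):
    """Build a sequential picture at the time of field 1.

    field1 and field2 map picture line numbers to lists of pixels;
    field1 carries the even lines and field2 the odd lines. vector is
    the (vx, vy) motion from field 1 to field 2 in whole pixels, with
    vy assumed even.
    """
    vx, vy = vector
    first_line, last_line = min(field2), max(field2)
    picture = []
    for y in range(height):
        if y in field1:
            picture.append(list(field1[y]))
            continue
        # Follow the motion forward to find the same object point in field 2.
        src_y = min(max(y + vy, first_line), last_line)
        row = [field2[src_y][min(max(x + vx, 0), width - 1)]
               for x in range(width)]
        picture.append(row)
    return picture


# A vertical bar that moves one pixel to the right between the two fields.
field1 = {0: [0, 9, 0, 0], 2: [0, 9, 0, 0]}
field2 = {1: [0, 0, 9, 0], 3: [0, 0, 9, 0]}
picture = fill_missing_lines(field1, field2, (1, 0), height=4, width=4)
```

In this example the bar is restored to its field 1 position on every line, which is the behaviour the motion vectors are intended to provide.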

A simple version of the output standards converter could also be built. It would displace moving objects to their correct position in the output field using motion information. Sub-pixel interpolation would be used to give the best output picture. At a later date a more sophisticated standards converter could be used to improve performance.

There are two main improvements to be gained by the use of better standards converters. Firstly they should reduce the noise on the output pictures. This is due to their narrower (motion compensated) temporal bandwidth. Secondly they should reduce temporal aliasing arising other than from the motion of objects. This is achieved by appropriate (motion compensated) pre and post filtering.

Interlace to sequential conversion is a process of interpolation. The missing intermediate lines in one field must be interpolated from adjacent lines in time and space. A straightforward approach is to interpolate the missing lines in one field from adjacent fields (before and/or after), allowing for the motion of objects. This is illustrated in FIG. 15, in which vector V represents the velocity of a moving object. Unfortunately this introduces spatial alias components. These arise because the (sampled) adjacent fields are interpolated and then shifted. Ideally the adjacent fields would be shifted before they were sampled, but this is impossible. Motion compensated conversion may be an improvement on fixed vertical-temporal filtering, because it produces spatial rather than spatio-temporal aliasing.

The problems inherent in interlace to sequential conversion can be appreciated by considering the frequency domain. Ideally the interlaced picture would be padded with zeros and filtered using an appropriate low pass aperture. FIGS. 16, 17, 18 and 19 show the spectra of a sampled stationary object, an object moving at 25s/ph, an object moving at 12s/ph and an object moving at 6.25s/ph (625/25/1:1). The dashed arrow in FIG. 16 corresponds to motion at 25s/ph. The dotted regions in FIG. 16 indicate appropriate motion compensated low pass filter apertures. These regions are shown as `square` but this does not affect the arguments.

FIG. 17 shows that even at low speeds an appropriate low pass interpolation aperture can only occupy half the available bandwidth. This means that as the vertical speed of an object increases, the motion compensated interpolator is less and less able to cope with changes in shape etc. of moving objects. By the time the speed of an object reaches a slow 12s/ph (FIG. 18), a motion compensated interpolator produces unavoidable spatial aliasing.

These arguments suggest that even a motion compensated interpolator cannot perform transparent interlace to sequential conversion of other than very slow moving pictures.

It may be considered that an equal bandwidth sequential source is more appropriate for motion compensated picture processing than an interlaced one. If an interlaced source is used, the results are likely to be markedly worse than for a sequential one.

The discussion in this section has considered an `ideal` source, that is, one which contains the maximum possible signal content. Such a source is a respectable target to aim for with HDTV. Current sources, however, do not provide the maximum possible signal content. This means that impairments caused by interlace to sequential conversion will be less severe with current sources than with the ideal sources described above.

We claim:
 1. Receiving or playback apparatus for a reduced bandwidth video signal accompanied by a digital signal carrying motion vector information pertaining individually to a plurality of blocks of pixels, characterized by means for accumulating sub-sample points from the fields of each of a repeating cycle of fields, and means responsive to the motion vector information to shift the sub-sample points in accordance with the corresponding motion vector information so that the accumulated points provide a high-definition picture both in stationary areas and moving areas which are correlated from field to field.
 2. Apparatus according to claim 1, characterised by means for interpolating intermediate points among the accumulated points.
 3. Apparatus according to claim 1, wherein the accumulated points of each field are derived from both preceding and succeeding fields of the same cycle of fields.
 4. Apparatus according to claim 1, wherein the accumulated points of each field are derived only from that field and preceding fields.
 5. Apparatus according to claim 1, further comprising means for reconstructing a picture by spatial interpolation of the sub-sample points received in one field only and means responsive to a mode signal accompanying the video signal to select for output between the picture reconstructed by accumulation of points and the picture reconstructed by spatial interpolation.
 6. Apparatus according to claim 5, wherein the selecting means switch abruptly at the junction between two blocks sent in different modes.
 7. Apparatus according to claim 5, wherein the selecting means switch gradually at the junction between two blocks sent in different modes.
 8. Apparatus according to claim 1, comprising a framestore in which the high-definition picture is reconstructed and means for filling the framestore with an image generated by simple interpolation from the sub-sample points of a single field, before reconstruction by accumulation of points takes place.
 9. Apparatus according to claim 1, characterized by means for tracking motion vectors back from the field being reconstructed and for causing sub-sample points not to be included in the accumulation thereof when they pertain to a preceding or succeeding field and to a motion vector not reached by the tracking from the field being reconstructed.
 10. Apparatus according to claim 1, comprising a following motion compensated standards converter responsive to the motion vectors.
 11. Apparatus for producing a reduced bandwidth video signal from an input video signal, characterized by means for generating motion vectors describing the movement of each of a plurality of blocks of pixels making up a picture, means for sub-sampling the input video signals to provide in each field of each of a repeating cycle of fields a corresponding sub-set of picture points in correct spatial relationship to the points of the other fields of the same cycle, and means providing a composite signal comprising the video signal created by the sub-sampling means and a digital signal conveying the motion vectors.
 12. Apparatus according to claim 11, wherein the sub-sampling means take all sub-sets of picture points for one cycle from the first field of that cycle.
 13. Apparatus according to claim 11, wherein the sub-sampling means take each sub-set of picture points from the corresponding field, utilizing a sampling lattice displaced in accordance with the motion vectors describing the displacement of each block.
 14. Apparatus according to claim 13, wherein the sampling lattice is restored to a datum position for the first field of each cycle.
 15. Apparatus according to claim 11, comprising means for producing an alternative reduced bandwidth signal by field to field sub-sampling implementing spatial filtering, and means for selecting block by block between the two reduced bandwidth signals to utilize that signal which best matches the input video signal.
 16. Apparatus according to claim 15, wherein the selecting means comprise means for reconstituting a high-definition signal from the first-mentioned reduced bandwidth signal, by accumulating subsample points from the fields of each cycle, shifted in accordance with the motion vectors, means for reconstituting a low-definition signal from the alternative reduced bandwidth signal by spatial interpolation and means for comparing each reconstituted signal with the input video signal to determine which provides the better match.
 17. Apparatus according to claim 16, wherein the comparing means performs a weighted comparison favouring the high-definition signal.
 18. Apparatus according to claim 15, comprising a spatial pre-filter preceding the field to field sub-sampling means.
 19. Apparatus according to claim 15, wherein the composite signal includes a mode signal indicating which reduced bandwidth signal is selected.
 20. Apparatus according to claim 11, wherein the sub-sampling means are preceded by a vertical-temporal or temporal filter whose input samples are so displaced by the motion vectors that each moving block is filtered as if it were stationary.
 21. Apparatus according to claim 11, wherein the sample structure is quincunxial and repeats over a four field cycle.
 22. Apparatus according to claim 11, wherein the input video signal is an interlaced signal, the motion vectors are measured from picture to picture and these motion vectors are interpolated to provide field to field motion vectors.
 23. Apparatus according to claim 11, wherein the means for generating motion vectors implement phase correlation between blocks of two successive images to extract dominant vectors as peaks in a correlation surface and each pixel or block of pixels is then assigned that one of the motion vectors which produces the best match from image to image.
 24. Apparatus according to claim 23, wherein the phase correlation is applied to large blocks and the vector assignment is to small blocks.
 25. Apparatus according to claim 11, comprising a preliminary motion compensated standards converter responsive to the motion vectors.